One of the relatively easy optimizations available is to use an up-to-date version of R. In general, R is very conservative, so upgrading doesn’t break existing code. However, a new version will often provide free speed boosts for key functions.
The version command returns a list that contains (among other things) the major and minor version of R currently being used.
# Print the R version details using version
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 0.0
year 2020
month 04
day 24
svn rev 78286
language R
version.string R version 4.0.0 (2020-04-24)
nickname Arbor Day
# Assign the variable major to the major component
major <- version$major
# Assign the variable minor to the minor component
minor <- version$minor
Keeping R up to date also makes it easier to get help when you are stuck.
One of the most common tasks we perform is reading in data from CSV files. However, for large CSV files this can be slow. One neat trick is to read in the data and save as an R binary file (rds) using saveRDS(). To read in the rds file, we use readRDS().
Note: Since rds is R’s native format for storing single objects, you have not introduced any third-party dependencies that may change in the future.
To benchmark the two approaches, you can use system.time(). This function returns the time taken to evaluate any R expression. For example, to time how long it takes to calculate the square root of the numbers from one to ten million, you would write the following:
system.time(sqrt(1:1e7))
user system elapsed
0.154 0.047 0.202
# How long does it take to read movies from CSV?
system.time(read.csv("movies.csv"))
user system elapsed
0.284 0.004 0.289
# How long does it take to read movies from RDS?
system.time(readRDS("movies.rds"))
user system elapsed
0.051 0.000 0.052
Reading in RDS files are much quicker than reading in CSV files.
There are a number of ways to assign variables to objects. The two standard ways are to use the = or <- operators. Using the <- operator inside a function call will create a new (or overwrite an existing) object.
This makes ‘<-’ useful inside system.time() since we can store the result.
Using system.time() is convenient, but it does have its drawbacks when comparing multiple function calls. The microbenchmark package solves this problem with the microbenchmark() function.
# Load the microbenchmark package
library(microbenchmark)
# Compare the two functions
compare <- microbenchmark(read.csv("movies.csv"),
readRDS("movies.rds"),
times = 10)
# Print compare
compare
Unit: milliseconds
min <dbl> | lq <dbl> | mean <dbl> | median <dbl> | uq <dbl> | max <dbl> | neval <dbl> | |
---|---|---|---|---|---|---|---|
283.70181 | 287.35011 | 299.52388 | 290.25690 | 297.31405 | 376.59949 | 10 | |
50.72125 | 50.81986 | 52.10398 | 51.16195 | 52.31819 | 58.71417 | 10 |
The microbenchmark() function makes it easier to compare multiple function calls at once by compiling all the relevant information in a data frame. It does this by running each function call multiple times, recording the time it took for the function to run each time, then computing summary statistics for each expression as you can see here.
When benchmarking it’s important to consider both absolute and relative times. Using the timings below, on average a single call to read.csv() is 720 - 80 = 640 milliseconds slower than that of readRDS(). Aprox, 9 times slower.
For many problems your time is the expensive part. If having a faster computer makes you more productive, it can be cost effective to buy one. However, before you splash out on new toys for yourself, your boss/partner may want to see some numbers to justify the expense. Measuring the performance of your computer is called benchmarking, and you can do that with the benchmarkme package.
# Load the benchmarkme package
library(benchmarkme)
# Assign the variable ram to the amount of RAM on this machine
ram <- get_ram()
ram
32.6 GB
# Assign the variable cpu to the cpu specs
cpu <- get_cpu()
cpu
$vendor_id
[1] "GenuineIntel"
$model_name
[1] "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"
$no_of_cores
[1] 16
The benchmarkme package allows you to run a set of standardized benchmarks and compare your results to other users. One set of benchmarks tests is reading and writing speeds.
The function call
res = benchmark_io(runs = 1, size = 5)
records the length of time it takes to read and write a 5MB file.
# Run the io benchmark
res <- benchmark_io(runs = 1, size = 5)
Preparing read/write io
# IO benchmarks (2 tests) for size 5 MB:
Writing a csv with 625000 values: 0.57 (sec).
Reading a csv with 625000 values: 0.228 (sec).
# Plot the results
plot(res)
You are ranked 15 out of 135 machines.
# Run each benchmark 3 times
Warning message:
In readChar(file, size, TRUE) : truncating string with embedded nuls
res <- benchmark_std(runs = 3)
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.603 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.772 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.424 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.25 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 0.924 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.673 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.501 (sec).
Sorting of 7,000,000 random values: 0.793 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 1.27 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.124 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 0.887 (sec).
Determinant of a 2,500 x 2,500 random matrix: 0.919 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.468 (sec).
FFT over 2,500,000 random values: 0.351 (sec).
Inverse of a 1,600 x 1,600 random matrix: 0.806 (sec).
plot(res)
You are ranked 258 out of 749 machines.
Growing a vector is one of the deadly sins in R; you should always avoid it.
The growing() function defined below generates n random standard normal numbers, but grows the size of the vector each time an element is added!
Note: Standard normal numbers are numbers drawn from a normal distribution with mean 0 and standard deviation 1.
n <- 30000
# Slow code
growing <- function(n) {
x <- NULL
for(i in 1:n)
x <- c(x, rnorm(1))
x
}
# Use <- with system.time() to store the result as res_grow
system.time(res_grow <- growing(n))
user system elapsed
1.444 0.000 1.449
That was really slow for such a small vector. Next you’ll see ways to speed up that code.
In the previous exercise, growing the vector took around 2 seconds. How long does it take when we pre-allocate the vector? The pre_allocate() function is defined below.
# Fast code
pre_allocate <- function(n) {
x <- numeric(n) # Pre-allocate
for(i in 1:n)
x[i] <- rnorm(1)
x
}
# Use <- with system.time() to store the result as res_allocate
system.time(res_allocate <- pre_allocate(n))
user system elapsed
0.051 0.024 0.074
Pre-allocating the vector is significantly faster than growing the vector!
The following piece of code is written like traditional C or Fortran code. Instead of using the vectorized version of multiplication, it uses a for loop.
x <- rnorm(10)
x2 <- numeric(length(x))
for(i in 1:10)
x2[i] <- x[i] * x[i]
Your job is to make this code more “R-like” by vectorizing it.
# Store your answer as x2_imp
x2_imp <- x * x
The new version of the code is quicker to type, easier to understand, and it runs faster!
A common operation in statistics is to calculate the sum of log probabilities. The following code calculates the log-sum (the sum of the logs).
# x is a vector of probabilities
total <- 0
for(i in seq_along(x))
total <- total + log(x[i])
NaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
# Initial code
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n)
total <- total + log(x[i])
# Rewrite in a single line. Store the result in log_sum
log_sum <- sum(log(x))
log() can take in a vector and outputs a vector. sum() takes in a vector and returns the sum of all the values in the vector.
All values in a matrix must have the same data type, which has efficiency implications when selecting rows and columns.
# Which is faster, mat[, 1] or df[, 1]?
microbenchmark(mat[, 1], df[ ,1])
expr min lq mean median uq max neval
mat[, 1] 1.098 1.286 1.49122 1.3695 1.556 6.631 100
df[, 1] 6.064 6.321 7.11863 6.5265 6.791 55.580 100
Because all values in a matrix object must be the same data type, it is much faster to access the first column of a matrix than it is to access that of a data frame.
Using microbenchmark(), how long does it take to select the first row from each of these objects?
To make the comparison fair, just use mat[1, ] and df[1, ].
# Which is faster, mat[1, ] or df[, 1]?
microbenchmark(mat[1, ], df[ ,1])
Unit: microseconds
expr min lq mean median uq max neval
mat[1, ] 4.122 4.5695 11.69504 9.305 18.2145 36.885 100
df[1, ] 4397.701 4530.5430 4837.12408 4589.192 4700.0570 7809.431 100
Accessing a row of a data frame is much slower than accessing that of a matrix, more so than when accessing a column from each data type. This is because the values of a column of a data frame must be the same data type, whereas that of a row doesn’t have to be.
# Load the data set
data(movies, package = "ggplot2movies")
# Load the profvis package
library(profvis)
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
# Profile the following code with the profvis function
profvis({
# Load and select data
comedies <- movies[movies$Comedy == 1, ]
# Plot data of interest
plot(comedies$year, comedies$rating)
# Loess regression line
model <- loess(rating ~ year, data = comedies)
j <- order(comedies$year)
# Add fitted line to the plot
lines(comedies$year[j], model$fitted[j], col = "red")
}) ## Remember the closing brackets!
One of the parts of the code that profvis highlighted was the line where we generated the possible dice rolls and stored the results in a data frame:
df <- data.frame(d1 = sample(1:6, 3, replace = TRUE),
d2 = sample(1:6, 3, replace = TRUE))
We can optimize this code by making two improvements:
This gives:
m <- matrix(sample(1:6, 6, replace = TRUE), ncol = 2)
# Use microbenchmark to time m() and d()
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
microbenchmark(
data.frame_solution = d(),
matrix_solution = m()
)
Unit: microseconds
expr <fctr> | min <dbl> | lq <dbl> | mean <dbl> | median <dbl> | uq <dbl> | max <dbl> | neval <dbl> |
---|---|---|---|---|---|---|---|
data.frame_solution | 121.415 | 123.4675 | 129.08917 | 125.2945 | 127.2285 | 291.692 | 100 |
matrix_solution | 4.887 | 5.7210 | 7.41577 | 7.2775 | 8.1565 | 27.494 | 100 |
The second bottleneck identified was calculating the row sums.
total <- apply(d, 1, sum)
In the previous example we switched the underlying object to a matrix. This makes the above apply operation three times faster. But there’s one further optimization we can use - switch apply() with rowSums()
# Compare the methods
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
microbenchmark(
app_sol = app(rolls),
r_sum_sol = r_sum(rolls)
)
Unit: microseconds
expr <fctr> | min <dbl> | lq <dbl> | mean <dbl> | median <dbl> | uq <dbl> | max <dbl> | neval <dbl> |
---|---|---|---|---|---|---|---|
app_sol | 14.951 | 15.3215 | 17.10399 | 15.5905 | 16.503 | 59.484 | 100 |
r_sum_sol | 3.168 | 3.4220 | 4.00829 | 3.7045 | 3.935 | 22.644 | 100 |
To determine if both dice are the same, the move_square() function uses if statements.
if (is_double[1] & is_double[2] & is_double[3]) {
current <- 11 # Go To Jail - square 11 == Jail
}
The & operator will always evaluate both its arguments. That is, if you type x & y, R will always try to work out what x and y are. There are some cases where this is inefficient. For example, if x is FALSE, then x & y will always be FALSE, regardless of the value of y. Thus, you can save a little processing time by not calculating it. The && operator takes advantage of this trick, and doesn’t bother to calculate y if it doesn’t make a difference to the overall result.
In this code, if is_double[1] is FALSE we don’t need to evaluate is_double[2] or is_double[3], so we can get a speedup by swapping & for &&.
One thing to note is that && only works on single logical values, i.e., logical vectors of length 1 (like you would pass into an if condition), but & also works on vectors of length greater than 1.
is_double = c(FALSE, TRUE, TRUE)
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
is_double
[1] FALSE TRUE TRUE
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
# Define the previous solution
move <- function(is_double) {
if (is_double[1] & is_double[2] & is_double[3]) {
current <- 11 # Go To Jail
}
}
# Define the improved solution
improved_move <- function(is_double) {
if (is_double[1] && is_double[2] && is_double[3]) {
current <- 11 # Go To Jail
}
}
# microbenchmark both solutions
# Very occassionally the improved solution is actually a little slower
# This is just random chance
microbenchmark(move, improved_move, times = 1e5)
Unit: nanoseconds
expr <fctr> | min <dbl> | lq <dbl> | mean <dbl> | median <dbl> | uq <dbl> | max <dbl> | neval <dbl> |
---|---|---|---|---|---|---|---|
move | 22 | 25 | 26.03437 | 25 | 26 | 5319 | 1e+05 |
improved_move | 21 | 23 | 24.05335 | 24 | 24 | 5890 | 1e+05 |
The parallel package has a function detectCores() that determines the number of cores in a machine.
# Load the parallel package
library(parallel)
# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()
# Print no_of_cores
no_of_cores
[1] 16
The following piece of code implements a simple dice game. The game is as follows:
total <- no_of_rolls <- 0 # Initialise
while(total < 10) {
total <- total + sample(1:6, 1)
if(total %% 2 == 0) total <- 0 # If even. Reset to 0
no_of_rolls <- no_of_rolls + 1
}
no_of_rolls
[1] 3
Do you think this algorithm can be (easily) run in parallel? No: it’s a sequential algorithm. The ith value depends on the previous value. press
If we wrap the game in a function:
play <- function() {
total <- no_of_rolls <- 0
while(total < 10) {
total <- total + sample(1:6, 1)
# If even. Reset to 0
if(total %% 2 == 0) total <- 0
no_of_rolls <- no_of_rolls + 1
}
no_of_rolls
}
And construct a loop to play the game
results <- numeric(100)
for(i in seq_along(results))
results[i] <- play()
results
[1] 16 5 8 8 69 8 5 10 8 6 17 21 16 14 3 5 5 6 9 2 10 2 8
[24] 17 3 29 4 21 10 7 10 15 6 28 19 17 9 8 55 7 8 16 29 7 9 6
[47] 4 25 12 29 40 54 21 7 14 4 17 3 10 13 16 7 5 30 12 10 15 36 10
[70] 20 11 7 11 11 12 61 13 27 68 28 9 19 4 3 3 4 5 32 15 20 8 2
[93] 17 17 6 18 10 3 14 5
results <- numeric(100)
for(i in seq_along(-results))
results[i] <- play()
results
[1] 4 12 16 21 5 59 63 10 3 11 10 9 43 3 21 51 4 13 15 67 3 49 18
[24] 14 22 8 17 29 13 5 10 21 22 3 11 20 10 11 14 14 16 13 11 4 21 8
[47] 18 26 4 16 37 39 31 15 11 14 4 17 6 28 7 7 6 6 36 2 9 8 18
[70] 2 23 6 4 34 11 7 12 19 20 18 20 3 4 6 10 27 34 31 4 5 13 5
[93] 18 14 5 3 16 18 7 2
Yes, this is embarrassingly parallel. We can simulate the games in any order. press Independent Monte Carlo simulations are ideal for parallel computation.
Sometimes parApply() is faster; it just depends on the problem.
To run code in parallel using the parallel package, the basic workflow has three steps.
The simplest way to make a cluster is to pass a number to makeCluster(). This creates a cluster of the default type, running the code on that many cores.
dd <- readRDS("dd.rds")
The object dd is a data frame with 10 columns and 100 rows. The following code uses apply() to calculate the column medians:
apply(dd, 2, median)
To run this in parallel, you swap apply() for parApply(). The arguments to this function are the same, except that it takes a cluster argument before the usual apply() arguments.
# Determine the number of available cores
detectCores()
[1] 16
# Create a cluster via makeCluster
cl <- makeCluster(2)
# Parallelize this code
parApply(cl, dd, 2, median)
[1] 0.01375112 -0.07879187 0.02337513 -0.02880553 -0.05578824
[6] -0.07316231 -0.14109027 -0.25877556 -0.01040861 0.05407892
# Stop the cluster
stopCluster(cl)
We previously played the following game:
The game could be simulated using the play() function:
play <- function() {
Warning messages:
1: In readChar(file, size, TRUE) : truncating string with embedded nuls
2: In readChar(file, size, TRUE) : truncating string with embedded nuls
3: In readChar(file, size, TRUE) : truncating string with embedded nuls
total <- no_of_rolls <- 0
while(total < 10) {
total <- total + sample(1:6, 1)
# If even. Reset to 0
if(total %% 2 == 0) total <- 0
no_of_rolls <- no_of_rolls + 1
}
no_of_rolls
}
To simulate the game 100 times, we could use a for loop or sapply():
res <- sapply(1:100, function(i) play())
This is perfect for running in parallel!
To make functions available on a cluster, you use the clusterExport() function. This takes a cluster and a string naming the function.
clusterExport(cl, "some_function")
library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)
# Export the play() function to the cluster
clusterExport(cl, "play")
# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())
# Stop the cluster
stopCluster(cl)
clusterExport() makes it easy to run your functions inside a cluster.
Running the dice game is embarrassingly parallel. These types of simulations usually (but not always) produce a good speed-up. As before, we can use microbenchmark() or system.time(). For simplicity, we’ll use system.time() in this exercise.
# Set the number of games to play
no_of_games <- 1e5
## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))
user system elapsed
7.008 0.014 7.048
## Set up cluster
cl <- makeCluster(2)
clusterExport(cl, "play")
## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))
user system elapsed
0.039 0.009 7.066
## Stop cluster
stopCluster(cl)